Introduction


Background

Title: Socioeconomic and Demographic Impacts on Average Cancer Diagnosis and Deaths per Year in the US

Author: Alexa Neal

In this project, I wanted to explore how various non-biological factors are related to the average number of cancer cases diagnosed and the average number of cancer deaths per year in the US. One study found that individuals on Medicare with low incomes had more difficulty affording their healthcare (Park & Fung, 2025), highlighting that factors other than genetics and biology can affect health outcomes. This also shows how certain populations are vulnerable to worse outcomes because of limited access to healthcare. Identifying these factors can help us understand the barriers to important medical resources and how to improve access.

Research Questions

To explore these concepts, I found two data sets on Kaggle, one containing health-related information and the other containing demographic information, and combined them. I used exploratory data analysis and multiple linear regression models to understand which factors correlate with the average number of cancer cases diagnosed and the average number of cancer deaths per year. My research questions were:

  1. Which socioeconomic and demographic factors are most strongly associated with the average number of cancer diagnoses per year across U.S. counties?
  2. Which socioeconomic and demographic factors are most strongly associated with the average number of cancer deaths per year across U.S. counties?


Variables of Interest

  • avganncount: Average number of cancer cases diagnosed annually

  • avgdeathsperyear: Average number of deaths due to cancer per year

  • medincome: Median income in the region

  • povertypercent: Percentage of population below the poverty line

  • pctprivatecoveragealone: Percentage of population covered by private health insurance alone

  • pctempprivcoverage: Percentage of population covered by employer-provided private health insurance

  • pctpubliccoveragealone: Percentage of population covered by public health insurance only

  • pctwhite: Percentage of White population

  • pctblack: Percentage of Black population

  • pctasian: Percentage of Asian population

Other Variables

  • target_deathrate: Target death rate due to cancer

  • incidencerate: Incidence rate of cancer

  • popest2015: Estimated population in 2015

  • studypercap: Per capita number of cancer-related clinical trials conducted

  • binnedinc: Binned median income

  • medianage: Median age in the region

  • pctpubliccoverage: Percentage of population covered by public health insurance

  • pctotherrace: Percentage of population belonging to other races

  • pctmarriedhouseholds: Percentage of married households

  • birthrate: Birth rate in the region

  • statefips: The FIPS code representing the state

  • countyfips: The FIPS code representing the county or census area within the state

  • avghouseholdsize: The average household size in the region

  • geography: The geographical location, typically represented as the county or census area name followed by the state name

Data Cleaning

Before performing any analysis, I used plot_intro() to visualize the data set and found that it contained missing values. I then used plot_missing() to see which columns these values were in. To clean the data, I removed the observations with missing values in pctprivatecoveragealone and pctemployed16_over, since I wanted to keep these variables in the analysis. I removed pctsomecol18_24 entirely because of its large number of missing values.

Introduction to Data

Missing Value Distribution

EDA


Diagnosis

Deaths

Insurance

Average Diagnoses

Average Deaths

Income

Average Diagnoses

Average Deaths

Race

Average Diagnoses

Average Deaths

Correlation

Average Diagnoses

                        log_avganncount pctprivatecoveragealone
log_avganncount              1.00000000               0.3289180
pctprivatecoveragealone      0.32891796               1.0000000
pctempprivcoverage           0.37937251               0.9297679
pctpubliccoveragealone      -0.15487012              -0.8562727
medincome                    0.34756921               0.7891926
povertypercent              -0.21782367              -0.7604732
pctwhite                    -0.08490097               0.3070070
pctblack                     0.03612490              -0.2740153
pctasian                     0.38219128               0.2843640
                        pctempprivcoverage pctpubliccoveragealone  medincome
log_avganncount                  0.3793725             -0.1548701  0.3475692
pctprivatecoveragealone          0.9297679             -0.8562727  0.7891926
pctempprivcoverage               1.0000000             -0.7344268  0.7540889
pctpubliccoveragealone          -0.7344268              1.0000000 -0.7195812
medincome                        0.7540889             -0.7195812  1.0000000
povertypercent                  -0.6846717              0.7981458 -0.7866990
pctwhite                         0.2689122             -0.3665684  0.1633035
pctblack                        -0.2418814              0.3329023 -0.2643873
pctasian                         0.2882007             -0.1812346  0.4141817
                        povertypercent    pctwhite    pctblack    pctasian
log_avganncount             -0.2178237 -0.08490097  0.03612490  0.38219128
pctprivatecoveragealone     -0.7604732  0.30700697 -0.27401532  0.28436398
pctempprivcoverage          -0.6846717  0.26891221 -0.24188144  0.28820069
pctpubliccoveragealone       0.7981458 -0.36656842  0.33290227 -0.18123465
medincome                   -0.7866990  0.16330351 -0.26438728  0.41418170
povertypercent               1.0000000 -0.51104550  0.51769900 -0.14739052
pctwhite                    -0.5110455  1.00000000 -0.83069477 -0.27463578
pctblack                     0.5176990 -0.83069477  1.00000000  0.02231497
pctasian                    -0.1473905 -0.27463578  0.02231497  1.00000000

Average Deaths

                        log_avgdeathsperyear pctprivatecoveragealone
log_avgdeathsperyear              1.00000000               0.2146236
pctprivatecoveragealone           0.21462355               1.0000000
pctempprivcoverage                0.31328784               0.9297679
pctpubliccoveragealone           -0.01207791              -0.8562727
medincome                         0.27524653               0.7891926
povertypercent                   -0.08250517              -0.7604732
pctwhite                         -0.18305899               0.3070070
pctblack                          0.13999944              -0.2740153
pctasian                          0.42229093               0.2843640
                        pctempprivcoverage pctpubliccoveragealone  medincome
log_avgdeathsperyear             0.3132878            -0.01207791  0.2752465
pctprivatecoveragealone          0.9297679            -0.85627271  0.7891926
pctempprivcoverage               1.0000000            -0.73442681  0.7540889
pctpubliccoveragealone          -0.7344268             1.00000000 -0.7195812
medincome                        0.7540889            -0.71958117  1.0000000
povertypercent                  -0.6846717             0.79814584 -0.7866990
pctwhite                         0.2689122            -0.36656842  0.1633035
pctblack                        -0.2418814             0.33290227 -0.2643873
pctasian                         0.2882007            -0.18123465  0.4141817
                        povertypercent   pctwhite    pctblack    pctasian
log_avgdeathsperyear       -0.08250517 -0.1830590  0.13999944  0.42229093
pctprivatecoveragealone    -0.76047315  0.3070070 -0.27401532  0.28436398
pctempprivcoverage         -0.68467172  0.2689122 -0.24188144  0.28820069
pctpubliccoveragealone      0.79814584 -0.3665684  0.33290227 -0.18123465
medincome                  -0.78669897  0.1633035 -0.26438728  0.41418170
povertypercent              1.00000000 -0.5110455  0.51769900 -0.14739052
pctwhite                   -0.51104550  1.0000000 -0.83069477 -0.27463578
pctblack                    0.51769900 -0.8306948  1.00000000  0.02231497
pctasian                   -0.14739052 -0.2746358  0.02231497  1.00000000


Analysis

The distributions of average annual cancer diagnoses and average annual cancer deaths are both right-skewed, which suggests that a log transformation would make them more symmetric. For the rest of this analysis, I used the log of average annual cancer diagnoses and the log of average annual cancer deaths.

Insurance coverage is weakly correlated with both the log of cancer diagnoses and the log of cancer deaths: private and employer-provided coverage are positively correlated, and public coverage is negatively correlated.

Median income and poverty percentage are also weakly correlated with both the log of cancer diagnoses and the log of cancer deaths, with median income positively correlated and poverty percentage negatively correlated.

The race variables are weakly correlated with both the log of cancer diagnoses and the log of cancer deaths, with one exception: the percentage of the Asian population has the strongest correlation of any variable examined (about 0.38 with the log of diagnoses and 0.42 with the log of deaths). The percentage of the White population is negatively correlated with both, and the percentages of the Black and Asian populations are positively correlated with both.

The correlation coefficients confirm these observations.

Methods


Data Cleaning

As mentioned previously, data cleaning removed the observations with missing values in pctprivatecoveragealone and pctemployed16_over and removed pctsomecol18_24 entirely. Furthermore, incidencerate and target_deathrate were removed because they measure aspects of cancer frequency or mortality that my response variables (avganncount and avgdeathsperyear) already capture. Similarly, log(avgdeathsperyear) was removed from the model for log(avganncount), and vice versa, and the untransformed avganncount and avgdeathsperyear were excluded from both models. Before fitting the models, binnedinc, geography, statefips, and countyfips were removed because they are labels or identifier codes rather than quantitative measurements. Finally, because the histograms of avganncount and avgdeathsperyear were right-skewed, I used the log of both response variables to make the data better suited to linear regression.

Models Fit

Because the log-transformed response variables are continuous, multiple linear regression was used, and the predictors were selected with backward stepwise selection based on AIC (stepAIC() from the MASS package).

  • multiple linear regression for log(avganncount)
  • multiple linear regression for log(avgdeathsperyear)

Modeling


Average Cancer Diagnoses


Call:
lm(formula = log_avganncount ~ popest2015 + povertypercent + 
    studypercap + medianagemale + medianagefemale + percentmarried + 
    pctnohs18_24 + pcths18_24 + pcths25_over + pctbachdeg25_over + 
    pctunemployed16_over + pctprivatecoverage + pctprivatecoveragealone + 
    pctempprivcoverage + pctpubliccoveragealone + pctwhite + 
    pctblack + pctasian + pctmarriedhouseholds + birthrate + 
    avghouseholdsize, data = cancer_diagnoses)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.2711 -0.6496 -0.0186  0.5772  3.7385 

Coefficients:
                          Estimate Std. Error t value Pr(>|t|)    
(Intercept)              2.402e+00  1.074e+00   2.236 0.025442 *  
popest2015               1.254e-06  7.481e-08  16.763  < 2e-16 ***
povertypercent          -6.722e-02  8.363e-03  -8.038 1.44e-15 ***
studypercap              7.249e-05  3.870e-05   1.873 0.061207 .  
medianagemale           -8.905e-02  1.279e-02  -6.965 4.27e-12 ***
medianagefemale          3.598e-02  1.366e-02   2.634 0.008502 ** 
percentmarried           3.386e-02  1.035e-02   3.273 0.001081 ** 
pctnohs18_24            -8.555e-03  3.402e-03  -2.514 0.011993 *  
pcths18_24              -4.714e-03  2.934e-03  -1.606 0.108346    
pcths25_over            -2.815e-02  5.720e-03  -4.921 9.24e-07 ***
pctbachdeg25_over        1.973e-02  9.012e-03   2.190 0.028634 *  
pctunemployed16_over     7.580e-02  9.786e-03   7.746 1.41e-14 ***
pctprivatecoverage       8.604e-02  1.038e-02   8.285  < 2e-16 ***
pctprivatecoveragealone -9.620e-02  1.360e-02  -7.076 1.96e-12 ***
pctempprivcoverage       7.073e-02  7.468e-03   9.471  < 2e-16 ***
pctpubliccoveragealone   8.708e-02  9.081e-03   9.590  < 2e-16 ***
pctwhite                 1.401e-02  3.510e-03   3.991 6.79e-05 ***
pctblack                 1.142e-02  3.360e-03   3.397 0.000692 ***
pctasian                 4.316e-02  1.160e-02   3.721 0.000204 ***
pctmarriedhouseholds    -6.061e-02  1.080e-02  -5.614 2.21e-08 ***
birthrate               -3.227e-02  1.144e-02  -2.822 0.004816 ** 
avghouseholdsize         4.366e-01  1.962e-01   2.225 0.026190 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.054 on 2310 degrees of freedom
Multiple R-squared:  0.4665,    Adjusted R-squared:  0.4616 
F-statistic: 96.18 on 21 and 2310 DF,  p-value: < 2.2e-16

Average Cancer Deaths


Call:
lm(formula = log_avgdeathsperyear ~ popest2015 + povertypercent + 
    medianagemale + pctnohs18_24 + pcths25_over + pctbachdeg25_over + 
    pctemployed16_over + pctunemployed16_over + pctprivatecoverage + 
    pctprivatecoveragealone + pctempprivcoverage + pctpubliccoveragealone + 
    pctwhite + pctblack + pctasian + pctmarriedhouseholds + birthrate + 
    avghouseholdsize, data = cancer_deaths)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.0593 -0.4928  0.0432  0.5655  2.3892 

Coefficients:
                          Estimate Std. Error t value Pr(>|t|)    
(Intercept)              4.008e+00  8.557e-01   4.684 2.98e-06 ***
popest2015               1.325e-06  6.027e-08  21.991  < 2e-16 ***
povertypercent          -8.044e-02  7.079e-03 -11.364  < 2e-16 ***
medianagemale           -6.800e-02  6.190e-03 -10.986  < 2e-16 ***
pctnohs18_24            -1.564e-02  2.667e-03  -5.864 5.17e-09 ***
pcths25_over            -6.803e-03  4.475e-03  -1.520 0.128524    
pctbachdeg25_over        5.250e-02  7.296e-03   7.196 8.32e-13 ***
pctemployed16_over      -1.749e-02  4.610e-03  -3.795 0.000152 ***
pctunemployed16_over     7.884e-02  8.304e-03   9.494  < 2e-16 ***
pctprivatecoverage       5.329e-02  8.348e-03   6.384 2.08e-10 ***
pctprivatecoveragealone -1.066e-01  1.118e-02  -9.528  < 2e-16 ***
pctempprivcoverage       8.397e-02  5.969e-03  14.069  < 2e-16 ***
pctpubliccoveragealone   7.903e-02  7.294e-03  10.835  < 2e-16 ***
pctwhite                 2.205e-02  2.821e-03   7.818 8.07e-15 ***
pctblack                 2.127e-02  2.677e-03   7.946 2.98e-15 ***
pctasian                 6.347e-02  9.391e-03   6.758 1.76e-11 ***
pctmarriedhouseholds    -2.684e-02  4.939e-03  -5.435 6.06e-08 ***
birthrate               -5.703e-02  9.185e-03  -6.209 6.30e-10 ***
avghouseholdsize         2.367e-01  1.311e-01   1.806 0.071057 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8516 on 2313 degrees of freedom
Multiple R-squared:  0.5845,    Adjusted R-squared:  0.5813 
F-statistic: 180.8 on 18 and 2313 DF,  p-value: < 2.2e-16


Analysis

Conclusions


Discussion

Limitations


Future Directions

Other socioeconomic and demographic factors that were not present in this data set could also be explored. For example, data on the number of people who underwent treatment could be related to the average number of cancer deaths per year. It would also be important to know whether the number of people who underwent treatment was related to the cost or availability of that treatment.

Additionally, although most of the selected predictors were statistically significant, the models explain only part of the variation (R-squared of 0.4665 for diagnoses and 0.5845 for deaths), which suggests that variables not in the data also relate to cancer diagnoses and deaths. Genetics and other biological factors could have a substantial impact on cancer outcomes.

Author


About the Author

My name is Alexa Neal, and I am currently a senior at the University of Dayton. I am pursuing a Bachelor of Science in Premedicine with minors in Data Analytics, Medicine and Society, and Neuroscience. I expect to graduate in May 2026, and I will be attending medical school in fall 2026.

AI Acknowledgement

Works Cited

Park, S., & Fung, V. (2025). Health Care Affordability Problems by Income Level and Subsidy Eligibility in Medicare. JAMA Network Open, 8(9), e2532862. https://doi.org/10.1001/jamanetworkopen.2025.32862


---
title: "Cancer Analysis"
output: 
  flexdashboard::flex_dashboard:
    theme: 
      version: 4
      bootswatch: lumen
    orientation: columns
    vertical_layout: fill
    source_code: embed
---

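```{r setup, include=FALSE}
library(flexdashboard)  # dashboard layout
library(DT)             # not used in the code shown
library(tidyverse)      # ggplot2, dplyr, tidyr::drop_na(), and friends
library(pacman)         # not used in the code shown
library(dplyr)          # already attached by tidyverse
library(DataExplorer)   # plot_intro(), plot_missing()
library(car)            # not used in the code shown
library(leaps)          # not used in the code shown
library(MASS)           # stepAIC(); MASS::select() masks dplyr::select(), so dplyr::select() is called explicitly below
```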

Introduction
===

Column {data-width=450}
---
### Background
**Title:** Socioeconomic and Demographic Impacts on Average Cancer Diagnosis and Deaths per Year in the US

**Author:** Alexa Neal

In this project, I wanted to explore how various non-biological factors are related to the average number of cancer cases diagnosed and the average number of cancer deaths per year in the US. [One study](https://pmc.ncbi.nlm.nih.gov/articles/PMC12455370/) found that individuals on Medicare with low incomes had more difficulty affording their healthcare (Park & Fung, 2025), highlighting that factors other than genetics and biology can affect health outcomes. This also shows how certain populations are vulnerable to worse outcomes because of limited access to healthcare. Identifying these factors can help us understand the barriers to important medical resources and how to improve access.

### Research Questions
To explore these concepts, I found two data sets on [Kaggle](https://www.kaggle.com/datasets/varunraskar/cancer-regression), one containing health-related information and the other containing demographic information, and combined them. I used exploratory data analysis and multiple linear regression models to understand which factors correlate with the average number of cancer cases diagnosed and the average number of cancer deaths per year. My research questions were:

1. Which socioeconomic and demographic factors are most strongly associated with the average number of cancer diagnoses per year across U.S. counties?
2. Which socioeconomic and demographic factors are most strongly associated with the average number of cancer deaths per year across U.S. counties?


Column {.tabset data-width=550}
---
### Variables of Interest

- avganncount: Average number of cancer cases diagnosed annually

- avgdeathsperyear: Average number of deaths due to cancer per year

- medincome: Median income in the region

- povertypercent: Percentage of population below the poverty line

- pctprivatecoveragealone: Percentage of population covered by private health insurance alone

- pctempprivcoverage: Percentage of population covered by employer-provided private health insurance

- pctpubliccoveragealone: Percentage of population covered by public health insurance only

- pctwhite: Percentage of White population

- pctblack: Percentage of Black population

- pctasian: Percentage of Asian population


### Other Variables

- target_deathrate: Target death rate due to cancer

- incidencerate: Incidence rate of cancer

- popest2015: Estimated population in 2015

- studypercap: Per capita number of cancer-related clinical trials conducted

- binnedinc: Binned median income

- medianage: Median age in the region

- pctpubliccoverage: Percentage of population covered by public health insurance

- pctotherrace: Percentage of population belonging to other races

- pctmarriedhouseholds: Percentage of married households

- birthrate: Birth rate in the region

- statefips: The FIPS code representing the state

- countyfips: The FIPS code representing the county or census area within the state

- avghouseholdsize: The average household size in the region

- geography: The geographical location, typically represented as the county or census area name followed by the state name

### Data Cleaning

Before performing any analysis, I used plot_intro() to visualize the data set and found that it contained missing values. I then used plot_missing() to see which columns these values were in. To clean the data, I removed the observations with missing values in pctprivatecoveragealone and pctemployed16_over, since I wanted to keep these variables in the analysis. I removed pctsomecol18_24 entirely because of its large number of missing values.
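As a numeric counterpart to plot_missing(), the share of missing values in each column can be computed with colMeans(is.na(...)). The sketch below is a minimal illustration on a small hypothetical data frame (the column names and values are made up, not taken from the cancer data). It mirrors the two cleaning steps described above: removing one mostly-missing column entirely, then dropping only the rows that are missing values in specific columns, similar to what drop_na() does when it is given column names.

```r
# Hypothetical toy data, not the project data
toy <- data.frame(
  keep1     = c(1, 2, NA, 4, 5),
  mostly_na = c(NA, NA, NA, 1, NA),
  keep2     = c(3, NA, 5, 6, 7),
  other     = c(NA, 1, 1, 1, 1)
)

# Share of missing values in each column (0.2, 0.8, 0.2, 0.2)
miss_share <- colMeans(is.na(toy))
miss_share

# Step 1: remove the mostly-missing column entirely
toy_clean <- toy[, names(toy) != "mostly_na"]

# Step 2: keep rows that are complete in keep1 and keep2 only;
# missing values in other columns (here, `other`) are left alone
toy_clean <- toy_clean[complete.cases(toy_clean[, c("keep1", "keep2")]), ]
toy_clean
```

Restricting the row filter to the named columns keeps more rows than calling complete.cases() on every column would; in this toy example, the first row survives even though `other` is missing there.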

#### Introduction to Data

```{r cleaning}
## reading + joining the data
household <- read.csv("C:/Users/write/OneDrive/Desktop/school/regression files/avg-household-size.csv")
cancer_reg <- read.csv("C:/Users/write/OneDrive/Desktop/school/regression files/cancer_reg (1).csv")
# with no `by` argument, left_join() matches on every column name the two data sets share
cancer <- left_join(cancer_reg, household)

plot_intro(cancer)
```


#### Missing Value Distribution

```{r missing value distribution}

plot_missing(cancer)

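# keep only rows with observed pctprivatecoveragealone and pctemployed16_over,
# then drop pctsomecol18_24, which has many missing values
cancer <- cancer %>%
  drop_na(pctprivatecoveragealone, pctemployed16_over) %>%
  dplyr::select(-c(pctsomecol18_24))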
```

EDA
===

Column {.tabset data-width=600}
---

### Diagnosis

```{r diagnosis histogram}
ggplot(cancer, aes(x = avganncount)) +
  geom_histogram(fill = "lightblue") +
  labs(title = "Distribution of Cancer Diagnoses",
       x = "Average Diagnoses Per Year", y = "Frequency")
```

### Deaths

```{r death histogram}
ggplot(cancer, aes(x = avgdeathsperyear)) +
  geom_histogram(fill = "lightblue") +
  labs(title = "Distribution of Cancer Deaths",
       x = "Average Deaths Per Year", y = "Frequency")
```

### Insurance

#### Average Diagnoses

```{r insurance diagnosis}
cancer$log_avganncount <- log(cancer$avganncount)

pairs(~log_avganncount + pctprivatecoveragealone
      + pctempprivcoverage + pctpubliccoveragealone, data = cancer)
```

#### Average Deaths

```{r insurance death}
cancer$log_avgdeathsperyear <- log(cancer$avgdeathsperyear)

pairs(~log_avgdeathsperyear + pctprivatecoveragealone
      + pctempprivcoverage + pctpubliccoveragealone, data = cancer)
```

### Income

#### Average Diagnoses

```{r income diagnoses}
pairs(~log_avganncount + medincome + povertypercent, data = cancer)
```

#### Average Deaths

```{r income death}
pairs(~log_avgdeathsperyear + medincome + povertypercent, data = cancer)
```

### Race

#### Average Diagnoses

```{r race diagnoses}
pairs(~log_avganncount + pctwhite + pctblack + pctasian,
      data = cancer)
```

#### Average Deaths

```{r race death}
pairs(~log_avgdeathsperyear + pctwhite + pctblack + pctasian,
      data = cancer)
```


### Correlation

#### Average Diagnoses

```{r diagnoses correlation}
cor(cancer[, c("log_avganncount", "pctprivatecoveragealone", "pctempprivcoverage",
               "pctpubliccoveragealone", "medincome", "povertypercent",
               "pctwhite", "pctblack", "pctasian")])
```

#### Average Deaths

```{r death correlation}
cor(cancer[, c("log_avgdeathsperyear", "pctprivatecoveragealone", "pctempprivcoverage",
               "pctpubliccoveragealone", "medincome", "povertypercent",
               "pctwhite", "pctblack", "pctasian")])
```


Column {data-width=400}
---
### Analysis

The distributions of average annual cancer diagnoses and average annual cancer deaths are both right-skewed, which suggests that a log transformation would make them more symmetric. For the rest of this analysis, I used the log of average annual cancer diagnoses and the log of average annual cancer deaths.
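To illustrate why a log transformation helps with right-skewed counts, the sketch below computes a simple moment-based skewness before and after taking the log. It uses simulated log-normal values as a stand-in (an assumption chosen because the log of a log-normal variable is exactly normal, and so symmetric), not the county data, so it only shows the general pattern. The helper name skewness() is hypothetical.

```r
# Moment-based sample skewness: mean cubed deviation divided by the cubed SD
skewness <- function(x) {
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))  # SD with divisor n, to match the moment form
  mean((x - m)^3) / s^3
}

set.seed(42)
# Hypothetical right-skewed values: log-normal draws, NOT the project data
y <- rlnorm(10000, meanlog = 4, sdlog = 1)

skewness(y)       # large and positive: long right tail
skewness(log(y))  # close to 0: roughly symmetric after the log
```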

Insurance coverage is weakly correlated with both the log of cancer diagnoses and the log of cancer deaths: private and employer-provided coverage are positively correlated, and public coverage is negatively correlated.

Median income and poverty percentage are also weakly correlated with both the log of cancer diagnoses and the log of cancer deaths, with median income positively correlated and poverty percentage negatively correlated.

The race variables are weakly correlated with both the log of cancer diagnoses and the log of cancer deaths, with one exception: the percentage of the Asian population has the strongest correlation of any variable examined (about 0.38 with the log of diagnoses and 0.42 with the log of deaths). The percentage of the White population is negatively correlated with both, and the percentages of the Black and Asian populations are positively correlated with both.

The correlation coefficients confirm these observations. 
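Because the research questions ask which factors are most strongly associated with each response, it can help to rank the correlations by absolute value. The sketch below does this for the values printed by the cor() calls in the Correlation tab, copied and rounded to three decimal places; cor_diag, cor_death, and rank_by_strength() are hypothetical names. With the live data, the first column of each cor() result (minus its diagonal entry) could be passed in instead. These are marginal, one-at-a-time correlations: the regression models adjust for the other predictors, and signs can differ. For example, pctpubliccoveragealone is negatively correlated with both log responses but has a positive coefficient in both fitted models.

```r
# Correlations with log_avganncount and log_avgdeathsperyear,
# copied from the cor() output and rounded to 3 decimal places
cor_diag <- c(pctprivatecoveragealone = 0.329, pctempprivcoverage = 0.379,
              pctpubliccoveragealone = -0.155, medincome = 0.348,
              povertypercent = -0.218, pctwhite = -0.085,
              pctblack = 0.036, pctasian = 0.382)
cor_death <- c(pctprivatecoveragealone = 0.215, pctempprivcoverage = 0.313,
               pctpubliccoveragealone = -0.012, medincome = 0.275,
               povertypercent = -0.083, pctwhite = -0.183,
               pctblack = 0.140, pctasian = 0.422)

# Sort by strength (absolute value), keeping the sign for direction
rank_by_strength <- function(r) r[order(abs(r), decreasing = TRUE)]

rank_by_strength(cor_diag)
rank_by_strength(cor_death)
```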


Methods
===

Column {data-width=500}
---
### Data Cleaning
As mentioned previously, data cleaning removed the observations with missing values in pctprivatecoveragealone and pctemployed16_over and removed pctsomecol18_24 entirely. Furthermore, incidencerate and target_deathrate were removed because they measure aspects of cancer frequency or mortality that my response variables (avganncount and avgdeathsperyear) already capture. Similarly, log(avgdeathsperyear) was removed from the model for log(avganncount), and vice versa, and the untransformed avganncount and avgdeathsperyear were excluded from both models. Before fitting the models, binnedinc, geography, statefips, and countyfips were removed because they are labels or identifier codes rather than quantitative measurements. Finally, because the histograms of avganncount and avgdeathsperyear were right-skewed, I used the log of both response variables to make the data better suited to linear regression.
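Because the responses are log-transformed, each coefficient is on the log scale. If log(y) = b0 + b1 x1 + ... + bk xk, then increasing x_j by one unit while holding the other predictors fixed adds b_j to log(y), which multiplies the fitted value of y by exp(b_j), a change of 100 × (exp(b_j) - 1) percent. On the original scale this describes the geometric mean (or median) rather than the arithmetic mean, and it is an association, not a causal effect. The sketch below applies this conversion to the povertypercent estimates copied from the two model summaries; pct_change() and b_poverty are hypothetical names.

```r
# Percent change in the original-scale response per one-unit increase in a
# predictor, for a model with a log-transformed response
pct_change <- function(b) 100 * (exp(b) - 1)

# povertypercent estimates copied from the two summary() outputs
b_poverty <- c(diagnoses = -6.722e-02, deaths = -8.044e-02)

round(pct_change(b_poverty), 1)  # about -6.5 and -7.7 percent per percentage point
```

For small coefficients, exp(b) - 1 is close to b itself, which is why the percent changes are near 100 × b.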

### Models Fit 
Because the log-transformed response variables are continuous, multiple linear regression was used, and the predictors were selected with backward stepwise selection based on AIC (stepAIC() from the MASS package).

- multiple linear regression for log(avganncount)
- multiple linear regression for log(avgdeathsperyear)
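With direction = "backward", stepAIC() starts from the full model and, at each step, drops the predictor whose removal lowers AIC the most, stopping when no single removal lowers it further. The sketch below re-creates that loop with base R's drop1() and update() on simulated data (x1, x2, noise1, noise2, sim, and backward_aic() are hypothetical names). It is meant to show the logic, not to replace stepAIC(), which also handles scopes, factors, and other model types.

```r
# Backward elimination by AIC for an lm() fit, using base R only
backward_aic <- function(fit) {
  repeat {
    d <- drop1(fit)                      # AIC of current model ("<none>") and of each one-term deletion
    best <- rownames(d)[which.min(d$AIC)]
    if (best == "<none>") break          # no deletion lowers AIC, so stop
    fit <- update(fit, as.formula(paste(". ~ . -", best)))
  }
  fit
}

# Hypothetical data: y depends on x1 and x2; noise1 and noise2 are unrelated
set.seed(1)
n <- 500
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n),
                  noise1 = rnorm(n), noise2 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 - 1.5 * sim$x2 + rnorm(n)

full <- lm(y ~ ., data = sim)
reduced <- backward_aic(full)
formula(reduced)
```

Here x1 and x2 truly affect y, so they should survive. A pure-noise predictor can still be kept occasionally, because AIC keeps any term whose improvement in fit outweighs its penalty of 2.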

Modeling 
===

Column {.tabset data-width=600}
---

### Average Cancer Diagnoses

```{r}
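# predictors for the diagnoses model: drop labels and ID codes, both raw responses,
# variables that restate cancer frequency or mortality, and the other log response
cancer_diagnoses <- cancer %>%
  dplyr::select(-c(binnedinc, geography, statefips, countyfips,
                   avganncount, avgdeathsperyear, target_deathrate,
                   incidencerate, log_avgdeathsperyear))

# fit the full model, then run backward selection by AIC
full.cancer.diagnoses <- lm(log_avganncount ~ ., data = cancer_diagnoses)
fit.backward.diagnoses <- stepAIC(full.cancer.diagnoses, direction = "backward", trace = FALSE)
summary(fit.backward.diagnoses)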
```

### Average Cancer Deaths

```{r}
# predictors for the deaths model: same exclusions, dropping log_avganncount instead
cancer_deaths <- cancer %>%
  dplyr::select(-c(binnedinc, geography, statefips, countyfips,
                   avganncount, avgdeathsperyear, target_deathrate,
                   incidencerate, log_avganncount))

# fit the full model, then run backward selection by AIC
full.cancer.deaths <- lm(log_avgdeathsperyear ~ ., data = cancer_deaths)
fit.backward.deaths <- stepAIC(full.cancer.deaths, direction = "backward", trace = FALSE)
summary(fit.backward.deaths)
```


Column {data-width=400}
---
### Analysis


Conclusions
===
Column {data-width=500}
---
### Discussion

### Limitations

Column {data-width=500}
---
### Future Directions
Other socioeconomic and demographic factors that were not present in this data set could also be explored. For example, data on the number of people who underwent treatment could be related to the average number of cancer deaths per year. It would also be important to know whether the number of people who underwent treatment was related to the cost or availability of that treatment.

Additionally, although most of the selected predictors were statistically significant, the models explain only part of the variation (R-squared of 0.4665 for diagnoses and 0.5845 for deaths), which suggests that variables not in the data also relate to cancer diagnoses and deaths. Genetics and other biological factors could have a substantial impact on cancer outcomes.
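The modest R-squared values are not an artifact of fitting many predictors. Adjusted R-squared, 1 - (1 - R^2)(n - 1)/(n - p - 1), penalizes extra predictors, and with about 2,332 counties (residual degrees of freedom plus the number of predictors plus one) that penalty is small. The sketch below reproduces the reported adjusted values from the printed R-squared, n, and p; adj_r2() is a hypothetical helper name, and small differences come from the rounding of the printed R-squared.

```r
# Adjusted R-squared from R-squared, sample size n, and number of predictors p
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# n = residual df + p + 1: 2310 + 21 + 1 = 2332 (diagnoses), 2313 + 18 + 1 = 2332 (deaths)
adj_r2(0.4665, n = 2332, p = 21)  # close to the reported 0.4616
adj_r2(0.5845, n = 2332, p = 18)  # close to the reported 0.5813

# Share of variance in each log response left unexplained by the model
1 - c(diagnoses = 0.4665, deaths = 0.5845)
```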

Author
===

Column {data-width=500}
---
### About the Author
My name is Alexa Neal, and I am currently a senior at the University of Dayton. I am pursuing a Bachelor of Science in Premedicine with minors in Data Analytics, Medicine and Society, and Neuroscience. I expect to graduate in May 2026, and I will be attending medical school in fall 2026.

### AI Acknowledgement

### Works Cited
Park, S., & Fung, V. (2025). Health Care Affordability Problems by Income Level and Subsidy Eligibility in Medicare. *JAMA Network Open*, *8*(9), e2532862. https://doi.org/10.1001/jamanetworkopen.2025.32862

Column {data-width=500}
---
###

```{r headshot}
knitr::include_graphics("C:/Users/write/OneDrive/Desktop/photos/headshot.JPG")
```